Let’s plot singletons:

## quartz_off_screen 
##                 2

alt text

## [1] "/Users/sarahurbut/gtexresults_matrixash/Analysis"

We can consider the number of statistics at each threshold using both methods:

thresh=0.05
sum(lfsr.mash<thresh)
## [1] 303809

Let’s also count how many associations we might count as inconsistent by simply counting the number of times the signs differed in a vector of posterior means for a given gene snp pair. This changes without brain, as with brain, there were 9597 inconsistent instances.

inconsistent.func=function(posterior.means,lfsr,thresh=0.05){
h=apply(posterior.means,1,function(p){
  pos=sum(p>0);neg=sum(p<0);pos*neg!=0})
sum(h=="TRUE")}
inconsistent.func(pm.mash.beta,lfsr.mash)
## [1] 7723
inconsistent.func(pm.ash.beta,lfsr.mash)
## [1] 14074

If we restrict our analysis to only those considered significant at an lfsr threshold using ash computed LFSR, we have a smaller number, but this is simply because so many fewer associations are called significant using ash. Using the MASH lfsrs, the results are identical to using the univariate z statistics. WE can compare to the matrix ash posterior means as well. This was 3180 (roughtly 20% of the genes) when using Brains.

 thresh=0.05



z=sapply(seq(1:nrow(pm.mash.beta)),function(x){
l=lfsr.mash[x,];p=pm.mash.beta[x,];plow=p[which(l<thresh)];##grab only those posterior means that are 'significant'
if(length(plow)==0){return("FALSE")}##for ones who show no significants, they can't be heterogenous
else{pos=sum(plow>0);neg=sum(plow<0);pos*neg!=0}})
sum(z==TRUE)
## [1] 2377

We can also consider a histogram of normalized posterior means and compute how often there is consistency in sign.

maxb=read.table("../Data/maxbetahats.txt")[,-c(7:16)]

bnorm=het.norm(effectsize = maxb)
sum(bnorm>0)/length(bnorm)
## [1] 0.726567
matrix.ash.norm=het.norm(effectsize = pm.mash.beta)
sum(matrix.ash.norm>0)/length(matrix.ash.norm)
## [1] 0.8797172
uni.ash.norm=het.norm(effectsize = pm.ash.beta)
sum(uni.ash.norm>0)/length(matrix.ash.norm)
## [1] 0.7299788
pm.mash.beta.brain=read.table("../Data//Aug13withEDposterior.betas.txt")
matrix.ash.norm.brain=het.norm(effectsize = pm.mash.beta.brain)
sum(matrix.ash.norm.brain>0)/length(matrix.ash.norm.brain)
## [1] 0.8330297
ash.norm=het.norm(effectsize = pm.ash.beta)

We might also be interested in the proportion that are greater than a particular threshold of max effect. This

sum(bnorm>0.5)/length(bnorm)
## [1] 0.2141482
sum(uni.ash.norm>0.5)/length(bnorm)
## [1] 0.1382311
sum(matrix.ash.norm>0.5)/length(matrix.ash.norm)
## [1] 0.4497242
sum(matrix.ash.norm.brain>0.5)/length(matrix.ash.norm.brain)
## [1] 0.3540626

Now, let’s plot:

par(mfrow=c(1,3))

hist(matrix.ash.norm,freq=FALSE,ylim=c(0,5),nclass=100,main="B_jr|Data/B_jr[which.max(abs(B_jr|Data))],MASHNOBRAIN")
legend("left",legend=paste0("mash.norm<0=",round((sum(matrix.ash.norm<0)/length(matrix.ash.norm)),3)))
hist(matrix.ash.norm.brain,freq=FALSE,ylim=c(0,5),nclass=100,main="WITHBRAIN,MASH")
legend("left",legend=paste0("mash.norm<0=",round((sum(matrix.ash.norm.brain<0)/length(matrix.ash.norm.brain)),3)))
hist(bnorm,freq=FALSE,ylim=c(0,5),nclass=100,main="B_jr|Data/B_jr[which.max(abs(B_jr|Data))],B_raw")
legend("left",legend=paste0("b.norm<0=",round((sum(bnorm<0)/length(matrix.ash.norm)),3)))

mash.sign=sign.norm(effectsize = pm.mash.beta)
mash.sign.brain=sign.norm(effectsize = pm.mash.beta.brain)

b.sign=sign.norm(effectsize = maxb)

par(mfrow=c(1,3))
hist(mash.sign,freq=FALSE,ylim=c(0,10),main="B_jr|Data/sign(B_jr[which.max(abs(B_jr|Data))],MASH",nclass=50)
legend("left",legend=paste0("mash.sign<0=",round((sum(matrix.ash.norm<0)/length(matrix.ash.norm)),3)))
hist(mash.sign.brain,freq=FALSE,ylim=c(0,10),main="WITHBRAIN,MASH",nclass=50)
legend("left",legend=paste0("mash.sign<0=",round((sum(matrix.ash.norm.brain<0)/length(matrix.ash.norm.brain)),3)))
hist(b.sign,freq=FALSE,ylim=c(0,10),main="B_jr|Data/sign(B_jr[which.max(abs(B_jr|Data))],Z.raw)",nclass=50)
legend("left",legend=paste0("z.sign<0=",round((sum(bnorm<0)/length(bnorm)),3)))

Understandably, there are no examples in which the sign of the posterior mean using univariate ash is flipped because there is no sharing of information across tissues, while with matrix ash, the sign changes about 18% of the time:

b.stat=maxb
sum2=b.stat*pm.mash.beta
sum(sum2<0)/length(unlist(sum2))
## [1] 0.1792216

Let’s consider the biplots of the values vs their maximums: First, I plot the biplot of the Maximum values vs their Normalized using Posterior means computed with matrix ash, univariate ash, and univariate summary statistics.

alt text

alt text

alt text

alt text

alt text

And finally, we can compare the magnitude heterogeneity - that is, the number of tissues in which the eQTL has an activity greater than \(\alpha\) or (less than \(\alpha\)) of the maximum effect. Let’s compare using and excluding brains. We can see that excluding brains from our analysis tends to push our distribution towards homogeneity, in that a greater number of eQTL have a majoirty of their tissues close to the effect.

Now, let’s consider the number of tissues vs normalziing value. We plot the median of the value used to normalize for each group of SNPs that are of the same number of tissue class. We can see that the ‘brain homogenous effects’ (i.e., tissues in which the brain was chosen as the tissue of max effect) seem to have a lower median max effect, indicating that when brain is the highest, the other effects tend to be small.

ebnorm=het.norm(effectsize=pm.mash.beta.brain)
h=het.func(ebnorm,threshold=0.5)
max.mash.brain=apply(pm.mash.beta.brain,1,function(x){
(x[which.max(abs(x))])})
s=sapply(seq(1:44),function(x){
tissue.match=which(h==x)
median(abs(max.mash.brain[tissue.match]))})
   
plot(seq(1:44),s)

Now, let’s look at excluding brain:

ebnorm=het.norm(effectsize=pm.mash.beta)
h=het.func(ebnorm,threshold=0.5)
max.mash=apply(pm.mash.beta,1,function(x){
(x[which.max(abs(x))])})
s=sapply(seq(1:ncol(pm.mash.beta)),function(x){
tissue.match=which(h==x)
median(abs(max.mash[tissue.match]))})

plot(seq(1:ncol(pm.mash.beta)),s)

Let’s compare the empirical distribtuion with and without brains of the proportion of tissues which are greater than 50% of maximum effect:

require( Hmisc )
Ecdf(x = het.func(het.norm(effectsize=pm.mash.beta),threshold=0.5)/34,main="eCDF of Proportion of Tissues which are greater than 50% of maximum effect",col="blue",lty=1,xlab="Proportion of Tissues greater than 50% max effect")
Ecdf(x = het.func(het.norm(effectsize=pm.mash.beta.brain),threshold=0.5)/44,add = TRUE, col = 'red', lty = 1)
Ecdf(x = het.func(brain.norm,threshold=0.5)/10,add = TRUE, col = 'green', lty = 1)
legend("bottomright",legend=c("AllTissues","NoBrain","BrainOnly"),col=c("red","blue","green"),lty=c(1,1,1))

Let’s comapre the sum of normalized bstatistics across tissues.For each gene, compute sum (norm.b)^2 such that norm.b>0 (eliminate off diretion).

How do the tissues with high or low values of our heterogeneity index behave?

Let’s looks at how heterogeneity behaves. Here, we plot examples in which the vast majority of tissues have a heterogeneity index greater than 50%. We plot the E(Z|D) for those in which the QTL were ‘homogeneous’ in most:

h.all=het.func(normalized.all,threshold=0.5)## Normalize and determine heterogeneity based on E(b\Data) with negative values excluded
h.all=het.func(het.norm(pm.mash.beta.brain),threshold = 0.5)
library(qtlcharts)
rownames(pm.mash.all)=rownames(normalized.all)
iplotCurves(curveMatrix = pm.mash.all[h.all>40,],chartOpts=list(curves_xlab="Tissue", curves_ylab="E(Z|D)"))
## Set screen size to height=700 x width=1000

#iplotCurves(curveMatrix = pm.mash.all[hetindex>30,],chartOpts=list(curves_xlab="Tissue", curves_ylab="E(Z|D)"))
#iplotCurves(curveMatrix = pm.mash.beta.brain[h.all<2,])

Now, we look at eQTL in which the effects are consistent in only a few:

iplotCurves(curveMatrix = pm.mash.all[h.all<2,],chartOpts=list(curves_xlab="Tissue", curves_ylab="E(Z|D)"))

#iplotCurves(curveMatrix = pm.mash.all[hetindex<2,],chartOpts=list(curves_xlab="Tissue", curves_ylab="E(Z|D)"))
#iplotCurves(curveMatrix = z.stat[het.genes.with.brain<2,],chartOpts=list(curves_xlab="Tissue", curves_ylab="Z.statBrain"))

From this comparison we can also see that the non-homogenous eQTL tned to have smaller max effects.